Reordering with Source Language Collocations
نویسندگان
چکیده
This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods.
منابع مشابه
Collocational Processing in Two Languages: A psycholinguistic comparison of monolinguals and bilinguals
With the renewed interest in the field of second language learning for the knowledge of collocating words, research findings in favour of holistic processing of formulaic language could support the idea that these language units facilitate efficient language processing. This study investigated the difference between processing of a first language (L1) and a second language (L2) of congruent col...
متن کاملWord Alignment-Based Reordering of Source Chunks in PB-SMT
Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target langu...
متن کاملA Reordering Approach for Statistical Machine Translation
This paper presents a Markov based hierarchical reordering scheme for lexical reordering to incorporate into phrase-based statistical machine translation system. The goal is to reorder the words and phrases in source language syntactic structure into their corresponding target language syntactic order for making translation easy. Without reordering during language translation, sentences can onl...
متن کاملA Direct Syntax-Driven Reordering Model for Phrase-Based Machine Translation
This paper presents a direct word reordering model with novel syntax-based features for statistical machine translation. Reordering models address the problem of reordering source language into the word order of the target language. IBM Models 3 through 5 have reordering components that use surface word information but very little context information to determine the traversal order of the sour...
متن کاملSyntactic Preprocessing for Statistical Machine Translation
We describe an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies, that are word aligned with target language words in a parallel corpus, are used to automatically extract syntactic reordering rules in the same spirit of Xia and McCord (2004) and Zhang et al. (2007). The extracted ...
متن کامل